A tool-box for hidden Markov models with two novel, memory efficient parameter training algorithms

Authors

  • Yin Tin
  • Lam
Abstract

Hidden Markov models (HMMs) are powerful statistical tools for biological sequence analysis. Many recently developed bioinformatics applications employ variants of HMMs to analyze diverse types of biological data. It is typically fairly easy to design the states and the topological structure of an HMM. However, it can be difficult to estimate parameter values that yield good prediction performance. As many HMM-based applications employ similar algorithms for generating predictions, it is also time-consuming and error-prone to re-implement these algorithms whenever a new HMM-based application is designed. This thesis addresses these challenges by introducing a toolbox, called HMMConverter, which only requires an XML input file to define an HMM and to use it for sequence decoding and parameter training. The package not only allows for rapid prototyping of HMM-based applications, but also incorporates several algorithms for sequence decoding and parameter training, including two new, linear-memory algorithms for parameter training. Using this software package, even users without programming knowledge can quickly set up sophisticated HMMs and pair-HMMs and use them with efficient algorithms for parameter training and sequence analysis. We use HMMConverter to construct a new comparative gene prediction program, called Annotaid, which can predict pairs of orthologous genes by integrating prior information about each input sequence probabilistically into the gene prediction process and into parameter training. Annotaid can thus be readily adapted to predict orthologous gene pairs in newly sequenced genomes.
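
As a generic illustration of why memory use matters for HMM algorithms, the sketch below computes the log-likelihood of a sequence with the standard forward algorithm while keeping only one column of forward values at a time. It is not HMMConverter code and it is not either of the two training algorithms introduced in the thesis; the function name `forward_log_likelihood` and the list-based parameter layout (`init`, `trans`, `emit`) are assumptions made for this example.

```python
import math


def forward_log_likelihood(obs, init, trans, emit):
    """Return log P(obs) for a discrete HMM.

    Hypothetical parameter layout (an assumption of this sketch):
      init[s]      probability of starting in state s
      trans[s][t]  probability of moving from state s to state t
      emit[s][x]   probability that state s emits symbol x
    Assumes obs is non-empty and has non-zero probability under the model.
    """
    n_states = len(init)

    # Forward values for the first observation.
    col = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    scale = sum(col)
    log_lik = math.log(scale)
    col = [v / scale for v in col]  # rescale to avoid numerical underflow

    # Each step needs only the previous column, so the memory used
    # does not grow with the length of the sequence.
    for symbol in obs[1:]:
        new_col = [
            emit[t][symbol] * sum(col[s] * trans[s][t] for s in range(n_states))
            for t in range(n_states)
        ]
        scale = sum(new_col)
        log_lik += math.log(scale)
        col = [v / scale for v in new_col]

    return log_lik


if __name__ == "__main__":
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    # Probability of observing symbol 0 followed by symbol 1: about 0.209
    print(math.exp(forward_log_likelihood([0, 1], init, trans, emit)))
```

Parameter training generally needs more than this single pass, which is where memory-efficient training algorithms such as those named in the abstract become relevant.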

Similar resources

HMMConverter 1.0: a toolbox for hidden Markov models

Hidden Markov models (HMMs) and their variants are widely used in Bioinformatics applications that analyze and compare biological sequences. Designing a novel application requires the insight of a human expert to define the model's architecture. The implementation of prediction algorithms and algorithms to train the model's parameters, however, can be a time-consuming and error-prone task. We h...

An Adaptive Approach to Increase Accuracy of Forward Algorithm for Solving Evaluation Problems on Unstable Statistical Data Set

Nowadays, hidden Markov models are extensively utilized for modeling stochastic processes. These models help researchers establish and implement the desired theoretical foundations using Markov algorithms such as the Forward algorithm. However, using the stability hypothesis and the mean statistic for determining the values of Markov functions on unstable statistical data sets has led to a significant reducti...

Efficient Parallel Learning of Hidden Markov Chain Models on SMPs

Quad-core CPUs have become a common desktop configuration in today's offices. The increasing number of processors on a single chip opens new opportunities for parallel computing. Our goal is to make use of multi-core as well as multi-processor architectures to speed up large-scale data mining algorithms. In this paper, we present a general parallel learning framework, Cut-And-Stitch, for traini...

Modeling the spread of infectious diseases based on the Bayesian statistical approach

Background and Aim: Health surveillance systems are now paying more attention to infectious diseases, largely because of emerging and re-emerging infections. The main objective of this research is to present a statistical method for modeling infectious disease incidence based on the Bayesian approach. Material and Methods: Since infectious diseases have two phases, namely epidemic and non-epidem...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves the performance of these systems. DNN-based phoneme recognition systems have two phases: training and testing. Mos...

Publication date: 2008